Structural bioinformatics
bbcontacts: prediction of β-strand pairing from direct coupling patterns
Authors
Abstract
Motivation: It has recently become possible to build reliable de novo models of proteins if a multiple sequence alignment (MSA) of at least 1000 homologous sequences can be built. Methods of global statistical network analysis can explain the observed correlations between columns in the MSA by a small set of directly coupled pairs of columns. Strong couplings are indicative of residue-residue contacts, and from the predicted contacts a structure can be computed. Here, we exploit the structural regularity of paired β-strands that leads to characteristic patterns in the noisy matrices of couplings. The β–β contacts should be detected more reliably than single contacts, reducing the required number of sequences in the MSAs.

Results: bbcontacts predicts β–β contacts by detecting these characteristic patterns in the 2D map of coupling scores using two hidden Markov models (HMMs), one for parallel and one for antiparallel contacts. β-bulges are modelled as indel states. In contrast to existing methods, bbcontacts uses predicted instead of true secondary structure. On a standard set of 916 test proteins, 34% of which have MSAs with <1000 sequences, bbcontacts achieves 50% precision for contacting β–β residue pairs at 50% recall using predicted secondary structure and 64% precision at 64% recall using true secondary structure, while existing tools achieve around 45% precision at 45% recall using true secondary structure.

Availability and implementation: bbcontacts is open source software (GNU Affero GPL v3) available at https://bitbucket.org/soedinglab/bbcontacts

Contact: [email protected] or [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
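The pattern idea in the Results paragraph can be illustrated with a deliberately simplified sketch. This is not the bbcontacts HMM: it swaps in a fixed-length sum along diagonals (parallel pairing) and anti-diagonals (antiparallel pairing) of a coupling-score matrix, with no bulge handling and no secondary structure input. The function name `best_ladder`, its parameters `length` and `min_sep`, and the toy matrix are all hypothetical and chosen only for illustration.

```python
def best_ladder(scores, length=4, min_sep=3):
    """Find the fixed-length ladder with the highest summed coupling score.

    Simplified stand-in for pattern detection (assumption: NOT the HMM
    described in the abstract). A parallel ladder pairs residues
    (i+k, j+k); an antiparallel ladder pairs (i+k, j-k), for k = 0..length-1.
    Returns (total, i, j, orientation), or None if no ladder fits.
    """
    n = len(scores)
    best = None
    for i in range(n - length + 1):
        for j in range(n):
            for orientation, step in (("parallel", 1), ("antiparallel", -1)):
                j_end = j + step * (length - 1)
                if not 0 <= j_end < n:
                    continue  # ladder would leave the matrix
                cells = [(i + k, j + step * k) for k in range(length)]
                # Stay in the upper triangle and skip near-diagonal pairs.
                if any(b - a < min_sep for a, b in cells):
                    continue
                total = sum(scores[a][b] for a, b in cells)
                if best is None or total > best[0]:
                    best = (total, i, j, orientation)
    return best


# Toy 12x12 symmetric matrix with one planted antiparallel ladder.
n = 12
demo = [[0.0] * n for _ in range(n)]
for a, b in [(1, 10), (2, 9), (3, 8), (4, 7)]:
    demo[a][b] = demo[b][a] = 1.0

result = best_ladder(demo)
print(result)
```

Summing along a whole ladder is what makes the pattern easier to find than any single noisy cell. A real detector would additionally need variable ladder lengths and indel handling for β-bulges, which the abstract attributes to the HMM states.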
Similar resources
bbcontacts: Prediction of β-Strand Pairing from Direct Coupling Patterns
MOTIVATION It has recently become possible to build reliable de novo models of proteins if a multiple sequence alignment (MSA) of at least 1000 homologous sequences can be built. Methods of global statistical network analysis can explain the observed correlations between columns in the MSA by a small set of directly coupled pairs of columns. Strong couplings are indicative of residue-residue co...
Bayesian Protein Structure Prediction
An important role for statisticians in the age of the Human Genome Project has developed in the emerging area of “structural bioinformatics”. Sequence analysis and structure prediction for biopolymers is a crucial step on the path to turning newly sequenced genomic data into biologically and pharmaceutically relevant information in support of molecular medicine. We describe our work on Bayesian...
KScons: a Bayesian approach for protein residue contact prediction using the knob-socket model of protein tertiary structure
MOTIVATION By simplifying the many-bodied complexity of residue packing into patterns of simple pairwise secondary structure interactions between a single knob residue with a three-residue socket, the knob-socket construct allows a more direct incorporation of structural information into the prediction of residue contacts. By modeling the preferences between the amino acid composition of a sock...
Accurate computational prediction of the transcribed strand of CRISPR non-coding RNAs
MOTIVATION CRISPR RNAs (crRNAs) are a type of small non-coding RNA that form a key part of an acquired immune system in prokaryotes. Specific prediction methods find crRNA-encoding loci in nearly half of sequenced bacterial, and three quarters of archaeal, species. These Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) arrays consist of repeat elements alternating with specifi...
DAFS: simultaneous aligning and folding of RNA sequences via dual decomposition
MOTIVATION It is well known that the accuracy of RNA secondary structure prediction from a single sequence is limited, and thus a comparative approach that predicts a common secondary structure from aligned sequences is a better choice if homologous sequences with reliable alignments are available. However, correct secondary structure information is needed to produce reliable alignments of RNA ...